12 research outputs found
Towards Harmful Erotic Content Detection through Coreference-Driven Contextual Analysis
Adult content detection still poses a great challenge for automation.
Existing classifiers primarily focus on distinguishing between erotic and
non-erotic texts. However, they often lack the nuance needed to assess
potential harm. Unfortunately, content of this kind falls beyond the
reach of generative models because of its potentially harmful nature. Ethical
restrictions prohibit large language models (LLMs) from analyzing and
classifying harmful erotics, let alone generating them to create synthetic
datasets for other neural models. In such cases, where data is scarce and
difficult to handle, a thorough analysis of the structure of such texts, rather
than a large model, may offer a viable solution, especially given that harmful
erotic narratives, despite appearing similar to harmless ones, usually reveal
their harmful nature first through contextual information hidden in the
non-sexual parts of the narrative.
This paper introduces a hybrid neural and rule-based context-aware system
that leverages coreference resolution to identify harmful contextual cues in
erotic content. Collaborating with professional moderators, we compiled a
dataset and developed a classifier capable of distinguishing harmful from
non-harmful erotic content. Our hybrid model, tested on Polish text,
demonstrates a promising accuracy of 84% and a recall of 80%. Models based on
RoBERTa and Longformer without explicit usage of coreference chains achieved
significantly weaker results, underscoring the importance of coreference
resolution in detecting such nuanced content as harmful erotics. This approach
also offers the potential for enhanced visual explainability, supporting
moderators in evaluating predictions and taking necessary actions to address
harmful content.
Comment: Accepted for the 6th Workshop on Computational Models of Reference,
Anaphora and Coreference at the EMNLP 2023 Conference.
Polish Ashbery back in America?
John Ashbery’s poetry was introduced to Polish readers for the first time in 1986, in the now legendary "blue" issue of the world literature review "Literatura na Świecie". Although the American poet spoke vicariously through the voice of Piotr Sommer’s translatory ventriloquism, and later also via other translators’ voices, it is Andrzej Sosnowski who is typically perceived as the ambassador of Ashbery in Poland. He has been popularizing Ashbery’s poetry by translations and criticism, and - what is more important - also by publishing his own poetry, which has grown through Ashberian verse. Still, easy parallels should be avoided, especially if one considers translations of Sosnowski’s poetry into English. A discussion of a rendition by Rod Mengham, an American poet who translates in cooperation with the author, raises interesting questions of the limits and hermeneutics of translation and mediation of authorial intention, form and language. Can this poetry, so deeply rooted in the American tradition, resist translation into English? This article attempts to answer some of the questions by studying the processes and (im)possibilities of translation of Sosnowski’s poetry.
The Grammar and Syntax Based Corpus Analysis Tool For The Ukrainian Language
This paper provides an overview of StyloMetrix, a text mining tool
developed initially for the Polish language and later extended to English
and, recently, Ukrainian. StyloMetrix is built upon various metrics
crafted manually by computational linguists and researchers from literary
studies to analyze grammatical, stylistic, and syntactic patterns. Statistical
evaluation of syntactic and grammatical features is straightforward and
familiar for languages like English, Spanish, German, and others; it is yet
to be developed for low-resource languages like Ukrainian. We describe the
StyloMetrix pipeline and report some experiments with this tool on a text
classification task. We also describe our package's main limitations and the
evaluation procedure for the metrics.
Polski Ashbery w Ameryce (Polish Ashbery Back in America?)
John Ashbery’s poetry was introduced to Polish readers for the first time in 1986, in the now legendary “blue” issue of the world literature review “Literatura na Świecie”. Although the American poet spoke vicariously through the voice of Piotr Sommer’s translatory ventriloquism, and later also via other translators’ voices, it is Andrzej Sosnowski who is typically perceived as the ambassador of Ashbery in Poland. He has been popularizing Ashbery’s poetry by translations and criticism, and, what is more important, also by publishing his own poetry, which has grown through Ashberian verse. Still, easy parallels should be avoided, especially if one considers translations of Sosnowski’s poetry into English. A discussion of a rendition by Rod Mengham, an American poet who translates in cooperation with the author, raises interesting questions of the limits and hermeneutics of translation and mediation of authorial intention, form and language. Can this poetry, so deeply rooted in the American tradition, resist translation into English? This article attempts to answer some of the questions by studying the processes and (im)possibilities of translation of Sosnowski’s poetry.
StyloMetrix: An Open-Source Multilingual Tool for Representing Stylometric Vectors
This work provides an overview of the open-source multilingual tool
called StyloMetrix. It offers stylometric text representations that cover
various aspects of grammar, syntax and lexicon. StyloMetrix covers four
languages: Polish as the primary language, English, Ukrainian and Russian. The
normalized output of each feature can become a fruitful source for machine
learning models and a valuable addition to the embeddings layer for any deep
learning algorithm. We strive to provide a concise but exhaustive overview of
the application of the StyloMetrix vectors and to explain the sets of
developed linguistic features. The experiments have shown promising results in
supervised content classification with simple algorithms such as Random Forest
Classifier, Voting Classifier, Logistic Regression and others. The deep
learning assessments have shown the usefulness of the StyloMetrix vectors for
enhancing an embedding layer extracted from Transformer architectures.
StyloMetrix has proven to be a strong source of features for machine
learning and deep learning algorithms performing different classification
tasks.
Comment: 26 pages, 6 figures, pre-print for the conference.
BAN-PL: a Novel Polish Dataset of Banned Harmful and Offensive Content from Wykop.pl web service
Advances in automated detection of offensive language online, including hate
speech and cyberbullying, require improved access to publicly available
datasets comprising social media content. In this paper, we introduce BAN-PL,
the first open dataset in the Polish language that encompasses texts flagged as
harmful and subsequently removed by professional moderators. The dataset
encompasses a total of 691,662 pieces of content from a popular social
networking service, Wykop.pl, often referred to as the "Polish Reddit",
including both posts and comments, and is evenly split between two distinct
classes: "harmful" and "neutral". We provide a comprehensive description of the
data collection and preprocessing procedures, as well as highlight the
linguistic specificity of the data. The BAN-PL dataset, along with advanced
preprocessing scripts for, among other things, unmasking profanities, will be
publicly available.
Translator toward the Other: tropes and signatures
Wydział Filologii Polskiej i Klasycznej: Zakład Teorii Literatury, Literatury XX Wieku i Sztuki Przekładu
This dissertation proposes a new critical model of translation analysis. The method is based on translation tropics, an idea presented by Douglas Robinson in his book The Translator’s Turn, but it has been substantially modified and extended. Five tropes (irony, metonymy, synecdoche, hyperbole and metalepsis) correspond to five types of translators, each characterized by the affective motivation behind their translational decisions: the translator’s affects toward the Other of the source text and culture, such as Derridean “hostipitality” or the “abject” introduced by Butler. Besides the theoretical proposal, the dissertation offers analyses and interpretations of particular translational cases. They draw on works by Andrzej Sosnowski, John Ashbery, Tkaczyszyn-Dycki, Charles Bernstein, Reiner Kunze and Boris Akunin, and trace the affective motivations of the following translators: Andrzej Sosnowski, Rod Mengham, Doreen Daume, Jakub Ekier, VERSATORIUM and Jan Gondowicz.
GAN and GPT-2 neural networks, worn words and creativity, namely literary second-hand
Is creativity an exclusively human domain? Can a neural network, even one with the most sophisticated architecture, fed with material created and chosen by humans, be creative, and even if it can, will its work not be derivative of human work? Or perhaps, as Bakhtin claimed, and Kristeva after him, every utterance of ours is doomed to be derivative anyway, because such is the nature of language? What is creativity, what can artificial intelligence do, and what critical literary reflections can its work prompt, especially in the context of intertextual and interpoetic relations? In this article I look for answers using the example of GAN-type neural networks and the GPT-2 model. Alongside fragments of the analyzed texts and references to literary theory, the article also includes an introduction to the structure and essence of the technological solutions discussed.
From Intersemiotic Translation to Tie-In Products, or Transmedial Storytelling as a Translation Strategy
Using examples from literature, film, music, and some that elude unambiguous genre classification, the article presents translation series, understood as sequences of works (or even products) that interpret an original work (or each other) using other media. The traditional approach, which invokes intersemiotic translation for every non-linguistic translation, is extended here to include the marketing and market-driven motivations behind such translation operations, known as transmedial storytelling and the tie-in product sales strategy.
Far Beyond Google Translate: Natural Language Processing (NLP) in Translation and Translatology
The role of progress is paradoxical: the more technological development, the greater the participation of the human in formulating tasks and problems, supervising and correcting automated processes, and interpreting their outcomes. The hierarchy is preserved and humans are still indispensable, but this does not mean that in certain areas the machine's potential does not exceed that of the human, or that this advantage is not worth exploiting.
Natural language processing (NLP) is not a young field, but in recent years, thanks to the flourishing of deep learning methods, data and knowledge mining, and new hardware interfaces (including advanced image recognition), computer text analysis has been experiencing a real renaissance. As far as translation is concerned, it is mostly algorithms for machine translation that are discussed. This article, on the other hand, presents a slightly broader spectrum of the translation process and looks at the elements accompanying translation (such as criticism) in which the use of NLP methods may bring new and interesting results: results which, due to limited computing power, humans are unable to achieve. The discussion covers such aspects as the vector representation of language, stylometry and its applications, and the analysis of large data sets, all for the purposes of translation and translatology, broadly understood.